Are unequal clade priors problematic for Bayesian phylogenetics?
Abstract
Although Bayesian phylogenetic methodologies were first developed in the 1960s (Felsenstein, 1968, 2004), the approach remained relatively obscure until the initial release of the software application MrBayes (Huelsenbeck and Ronquist, 2001). Since that time, the popularity of Bayesian phylogenetics has increased tremendously, and it must now be considered a primary method of analysis on par with maximum likelihood, parsimony, and distance methods. The popularity of Bayesian analysis can be attributed to computational efficiencies that allow for explicit model-based analyses of large data sets in real time with simultaneous estimation of nodal support in the form of posterior probability values. Despite the initial enthusiasm generated by the availability of a fast likelihood-based approach, Bayesian phylogenetic analysis remains somewhat controversial. Much of the controversy is focused on two related issues: (1) the relationship between posterior probability values and nonparametric bootstrap proportions, with the nagging suspicion that posterior probabilities are too liberal (e.g., Suzuki et al., 2002), and (2) the influence of prior probabilities, especially so-called flat or uninformative priors, on resulting Bayesian posteriors (Felsenstein, 2004; Zwickl and Holder, 2004; Pickett and Randle, 2005). Although there has been a spate of simulation studies published during the past 2 years, most (Alfaro et al., 2003; Cummings et al., 2003; Douady et al., 2003; Erixon et al., 2003; Huelsenbeck and Rannala, 2004; Wilcox et al., 2002) have focused on the relationship between posterior probabilities and bootstrap proportions. The relative impact of priors on posteriors has only recently received the detailed study that is required to determine if current Bayesian implementations are appropriate and, if not, how they might be corrected (e.g., Zwickl and Holder, 2004; Lewis et al., 2005).
Bayesian phylogenetic analysis requires the designation of prior probabilities for each parameter in the analysis, including those for alternative tree topologies, branch lengths, and the nucleotide substitution model. In each case, we usually have little a priori information that would allow us to select an appropriate informative prior distribution; thus, researchers generally attempt to accommodate their ignorance by applying uninformative priors. Because the posterior probability is proportional to the product of the prior probability and the likelihood, a truly uninformative prior should allow the likelihood function to drive the outcome of the analysis (Huelsenbeck et al., 2002; Lewis, 2001a; Zwickl and Holder, 2004). Unfortunately, the designation of truly uninformative priors is notoriously difficult (see Kass and Wasserman, 1996; Zwickl and Holder, 2004), and advocates proceed with the hope that the likelihood will overwhelm inappropriately informative priors when they cannot be avoided. The viability of Bayesian phylogenetics may depend on inferences being robust to these unavoidably informative priors. In a recent article, Pickett and Randle (2005; hereafter referred to as “PR” for the sake of brevity) provide one of the first investigations of the relationship between prior and posterior probabilities for Bayesian phylogenetic analysis when applying inappropriately informative priors (see also Zwickl and Holder, 2004). They correctly recognized that the designation of uninformative priors on the tree topology does not result in uninformative clade priors. (We note that the prior probability distribution of clades can be viewed either as the joint distribution over all splits or as the marginal prior distribution for each individual split; here we are concerned with the former interpretation.)
This point was clearly illustrated by PR with a simple example: if one considers a fully bifurcating five-taxon tree, there are 15 reconstructions linking each possible pair of taxa and only 9 reconstructions linking any combination of three taxa. Thus, with rooted trees, the prior probabilities of larger and smaller clades will be greater than those of clades of intermediate size. All else being equal, the posterior probabilities of smaller and larger clades should be inflated relative to those of clades of intermediate size. PR presented two examples of this phenomenon by analyzing both empirical DNA and contrived data sets. We first focus on the contrived data because we believe these are the only results in the PR study that clearly indicate that informative...
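The five-taxon counts quoted above can be reproduced with a short sketch. It assumes rooted, fully bifurcating trees with labelled tips and a uniform prior over topologies. The function names are illustrative only and are not taken from PR or from any phylogenetics package.

```python
from fractions import Fraction


def rooted_tree_count(n_taxa: int) -> int:
    """Number of rooted, fully bifurcating trees on n_taxa labelled tips: (2n - 3)!!."""
    if n_taxa < 1:
        raise ValueError("n_taxa must be at least 1")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (empty product for n <= 2).
    for odd in range(3, 2 * n_taxa - 2, 2):
        count *= odd
    return count


def trees_containing_clade(n_taxa: int, clade_size: int) -> int:
    """Number of rooted trees in which one specific set of clade_size taxa forms a clade."""
    if not 1 <= clade_size <= n_taxa:
        raise ValueError("clade_size must be between 1 and n_taxa")
    # Resolve the clade internally, then treat it as a single tip among the remaining taxa.
    return rooted_tree_count(clade_size) * rooted_tree_count(n_taxa - clade_size + 1)


def clade_prior(n_taxa: int, clade_size: int) -> Fraction:
    """Prior probability of one specific clade under a uniform prior on topologies."""
    return Fraction(trees_containing_clade(n_taxa, clade_size), rooted_tree_count(n_taxa))


if __name__ == "__main__":
    n = 5
    for k in range(2, n):
        print(k, trees_containing_clade(n, k), clade_prior(n, k))
```

For five taxa this gives 15, 9, and 15 trees for a specific clade of two, three, and four taxa respectively, out of 105 rooted topologies, so the implied clade prior is lowest for the intermediate clade size even though every topology is equally probable a priori.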
Related articles
Are Nonuniform Clade Priors Important in Bayesian Phylogenetic Analysis? A Response to Brandley et al
The use and design of prior distributions that reflect prior ignorance have long been controversial in statistics. The use of any prior distribution marks the difference between Bayesian and frequentist schools of thought. Disagreements regarding the design of prior distributions to reflect ignorance, and the interpretation of posterior distributions derived from such priors, have resulted in c...
On the reliability of Bayesian posterior clade probabilities in phylogenetic analysis
This article discusses possible reasons why posterior clade probabilities obtained from Bayesian phylogenetic analyses might be inaccurate. It attempts to list all possible sources of uncertainty and error in Bayesian phylogenetic analysis. The choice of priors on trees has been suggested by several authors as a cause of inaccurate posterior clade probabilities. I argue strongly for using prior...
LETTER Choice of Topology Estimators in Bayesian Phylogenetic Analysis
Wheeler WC and Pickett KM (2008. Topology-Bayes versus clade-Bayes in phylogenetic analysis. Mol Biol Evol. 25:447–453.) discuss two ways of summarizing the posterior probability distribution of a Bayesian phylogenetic analysis, which they refer to as “topology-Bayes” and “clade-Bayes.” They claim that the clade-Bayes approach leads to problems such as “exaggerated clade support, inconsist...
Bayesian Sample size Determination for Longitudinal Studies with Continuous Response using Marginal Models
Introduction: Longitudinal study designs are common in many areas of scientific research, especially in the medical, social, and economic sciences. The reason is that longitudinal studies allow researchers to measure changes in each individual over time and often have higher statistical power than cross-sectional studies. Choosing an appropriate sample size is a crucial step in a successful study. A st...
Analysis of mitochondrial DNA sequences of Turcinoemacheilus genus (Nemacheilidae Cypriniformes) in Iran
Members of the genus Turcinoemacheilus (family Nemacheilidae) were subjected to molecular phylogenetic analysis in this study. This genus was reported in 2009 to inhabit the Karoon River drainage, contrary to the previous assumption that it was endemic to the Tigris River basin. It was sampled from three stations located in different tributaries of the Karoon drainage and evaluated to unders...
Journal: Systematic Biology
Volume 55, Issue 1
Publication date: 2006